11 research outputs found

    Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks

    Full text link
    In this paper, we present a novel approach to perform deep neural networks layer-wise weight initialization using Linear Discriminant Analysis (LDA). Typically, the weights of a deep neural network are initialized with: random values, greedy layer-wise pre-training (usually as Deep Belief Network or as auto-encoder) or by re-using the layers from another network (transfer learning). Hence, many training epochs are needed before meaningful weights are learned, or a rather similar dataset is required for seeding a fine-tuning of transfer learning. In this paper, we describe how to turn an LDA into either a neural layer or a classification layer. We analyze the initialization technique on historical documents. First, we show that an LDA-based initialization is quick and leads to a very stable initialization. Furthermore, for the task of layout analysis at pixel level, we investigate the effectiveness of LDA-based initialization and show that it outperforms state-of-the-art random weight initialization methods.Comment: 5 page

    DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

    Full text link
    We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source, and accessible as Web Service through DIVAServices.Comment: Submitted at the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 6 pages, 6 Figure

    Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass

    Full text link
    In the last decades we have witnessed the success of applications of Artificial Intelligence to playing games. In this work we address the challenging field of games with hidden information and card games in particular. Jass is a very popular card game in Switzerland and is closely connected with Swiss culture. To the best of our knowledge, performances of Artificial Intelligence agents in the game of Jass do not outperform top players yet. Our contribution to the community is two-fold. First, we provide an overview of the current state-of-the-art of Artificial Intelligence methods for card games in general. Second, we discuss their application to the use-case of the Swiss card game Jass. This paper aims to be an entry point for both seasoned researchers and new practitioners who want to join in the Jass challenge

    A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

    Full text link
    Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In the current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties when compared with ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel-level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval

    Improving Reproducible Deep Learning Workflows with DeepDIVA

    Full text link
    The field of deep learning is experiencing a trend towards producing reproducible research. Nevertheless, it is still often a frustrating experience to reproduce scientific results. This is especially true in the machine learning community, where it is considered acceptable to have black boxes in your experiments. We present DeepDIVA, a framework designed to facilitate easy experimentation and their reproduction. This framework allows researchers to share their experiments with others, while providing functionality that allows for easy experimentation, such as: boilerplate code, experiment management, hyper-parameter optimization, verification of data integrity and visualization of data and results. Additionally, the code of DeepDIVA is well-documented and supported by several tutorials that allow a new user to quickly familiarize themselves with the framework

    Cross-Depicted Historical Motif Categorization and Retrieval with Deep Learning

    No full text
    In this paper, we tackle the problem of categorizing and identifying cross-depicted historical motifs using recent deep learning techniques, with aim of developing a content-based image retrieval system. As cross-depiction, we understand the problem that the same object can be represented (depicted) in various ways. The objects of interest in this research are watermarks, which are crucial for dating manuscripts. For watermarks, cross-depiction arises due to two reasons: (i) there are many similar representations of the same motif, and (ii) there are several ways of capturing the watermarks, i.e., as the watermarks are not visible on a scan or photograph, the watermarks are typically retrieved via hand tracing, rubbing, or special photographic techniques. This leads to different representations of the same (or similar) objects, making it hard for pattern recognition methods to recognize the watermarks. While this is a simple problem for human experts, computer vision techniques have problems generalizing from the various depiction possibilities. In this paper, we present a study where we use deep neural networks for categorization of watermarks with varying levels of detail. The macro-averaged F1-score on an imbalanced 12 category classification task is 88.3 %, the multi-labelling performance (Jaccard Index) on a 622 label task is 79.5 %. To analyze the usefulness of an image-based system for assisting humanities scholars in cataloguing manuscripts, we also measure the performance of similarity matching on expert-crafted test sets of varying sizes (50 and 1000 watermark samples). A significant outcome is that all relevant results belonging to the same super-class are found by our system (Mean Average Precision of 100%), despite the cross-depicted nature of the motifs. This result has not been achieved in the literature so far

    Combining graph edit distance and triplet networks for offline signature verification

    No full text
    Offline signature verification is a challenging pattern recognition task where a writer model is inferred using only a small number of genuine signatures. A combination of complementary writer models can make it more difficult for an attacker to deceive the verification system. In this work, we propose to combine a recent structural approach based on graph edit distance with a statistical approach based on deep triplet networks. The combination of the structural and statistical models achieve significant improvements in performance on four publicly available benchmark datasets, highlighting their complementary perspectives
    corecore